BMC Genomics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Importance: Genome-wide association studies have identified hundreds of common single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) associated with primary open-angle glaucoma (POAG) risk, though these variants have modest effect sizes and individually may have minor contributions to disease development. As whole-genome sequencing data is becoming more readily available, structural variants and other complex genomic features can be interrogated for contribution to disease...
Primary open-angle glaucoma (POAG) disproportionately affects individuals of African ancestry, yet rare coding variation in this population remains understudied. To address this gap, we performed a multi-cohort exome-wide meta-analysis across POAAGG, PMBB, All of Us, and UK Biobank, including 4,815 POAG cases and 22,922 controls of genetically inferred African ancestry. Although no gene reached exome-wide significance, we identified several suggestive gene-level associations driven by rare varia...
Introduction: Sequence-based typing (SBT) has been the standard molecular typing method for understanding Legionella pneumophila genetic relationships. However, genome-scale typing approaches, namely core-genome (cg) or whole-genome (wg) multilocus sequence typing (MLST), provide higher discriminatory power. To advance these capabilities, the Legionella International Typing (LIT) workgroup was established to develop, evaluate, and disseminate a novel cgMLST schema with enhanced wgMLST resolution f...
Purpose: Fuchs endothelial corneal dystrophy (FECD) is a common corneal disease and a leading indication for endothelial keratoplasty (EK). Although CTG18.1 repeat expansion is a major genetic risk factor, the contribution of polygenic background to disease progression remains unclear. We evaluated whether combining CTG18.1 expansion status with a FECD-specific polygenic risk score (PRS) enables genomic prediction of progression to EK. Methods: We retrospectively analysed 589 individuals with FECD ...
Background: The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited. Results: We assembled a multi-platform resource of WGS datasets derived from short-read (Illum...
Household transmission of EV-D68 was identified in 35 of 1040 households (3.4%) in the Pacific Northwest between 2022 and 2024, with an estimated secondary attack rate of 15%. Sequences from within households clustered closely, with 0 to 2 pairwise nucleotide differences (median 1) between cases 6-14 days apart (median 7).
Genomic surveillance of influenza viruses informs vaccine strain selection and evolutionary forecasting. Sequencing efforts vary widely across U.S. states, which raises concerns about spatial sampling bias. We evaluated how well 10,958 influenza virus genomes sampled by our group in Michigan captured the genetic diversity in 34,743 genomes circulating nationally from the 2021/22 through 2024/25 seasons. We defined seasonal hemagglutinin haplotypes and tracked their detection across states. A sma...
Objective: To identify risk loci for Fuchs endothelial corneal dystrophy (FECD) and improve a genetic risk prediction model. Design: Genome-wide association study (GWAS), polygenic risk score (PRS) construction, and TCF4 CTG18.1 short tandem repeat (STR) length inference. Participants: The study included 7,316 Europeans (EUR) with FECD or related corneal dystrophy phenotypes and 1,588,467 controls from the UK Biobank, All of Us, FinnGen, and the Million Veteran Program. Two independent EUR FECD coho...
Methods that analyze single-cell RNA-seq+ATAC-seq multiome data have shown promise in linking enhancers to target genes by correlating chromatin accessibility with gene expression across cells. However, correlations among ATAC-seq peaks may induce non-causal tagging peak-gene links (analogous to tagging associations in GWAS); indeed, we confirm that tagging effects induced by peak co-accessibility are pervasive in peak-gene linking. We defined two scores for each ATAC-seq peak: co-accessibility ...
Applying deep learning models to RNA-Seq data poses substantial challenges, primarily due to the high dimensionality of the data and the limited sample sizes. To address these issues, this study introduces an advanced deep learning pipeline that integrates feature engineering with data augmentation. The engineering application focuses on biomedical engineering, specifically the classification of RNA-Seq datasets for disease diagnosis. The proposed framework was initially validated on synthetic d...
The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of ben...
Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables a comprehensive detection of FA variants, but a unified bioinformatic analysis p...
Somatic mutations and the tumor immune microenvironment in breast tumors are important predictors of treatment response and survival, yet data for Hispanic/Latina (H/L) women are limited. Here we analyzed whole exome sequencing data from tumor/normal pairs and RNAseq data from 748 H/L women and 388 non-Hispanic White (NHW) women. Overall, the somatic profiles in tumors from H/L women were similar to NHW women. However, somatic mutations in genome organizer CTCF were significantly more common in ...
Accurate classification of BRCA1 and BRCA2 variants is essential for cancer risk assessment and therapy selection, yet over one-third remain variants of uncertain significance (VUS). Here, using 120,660 real-world cancer genomic profiles with BRCA1 or BRCA2 variants from a >800,000-sample cohort, we develop machine learning models that predict pathogenicity using clinical and tumor-derived features, including a pan-cancer homologous recombination deficiency signature, co-mutated genes, zygosity,...
Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRS, leading to the development of numerous methods for PRS construction. Benchmarking these various methods thus becomes an essential task that is crucial for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenoty...
Introduction: Adverse drug reactions (ADRs) remain a major public health issue, and genetic factors contribute importantly to interindividual variability in drug response. Pharmacogenetic testing helps reduce ADR risk by optimizing drug selection and dosage, particularly in monogenic disorders. Material and Methods: Whole-exome sequencing of 6,739 samples from the Russian population was performed using the MGIEasy Universal DNA Library Prep Set on the DNBSEQ-G400 platform (MGI). Variants in 48 gene...
Background: Klebsiella pneumoniae is a common cause of neonatal sepsis in Africa, and is frequently hospital acquired. We recently reported an outbreak of multidrug-resistant K. pneumoniae sepsis amongst neonates at a rural hospital in The Gambia, West Africa, involving 57 cases and case fatality of 60%. Here we undertook a retrospective pathogen genomic epidemiology study of clinical and environmental K. pneumoniae isolated during the outbreak, to identify the outbreak strain, refine the epidemic...
STUDY QUESTION: Do structural genomic variants that can be identified using optical genome mapping contribute to male infertility? SUMMARY ANSWER: By using optical genome mapping we can identify several types of structural variants, both known and new, that may contribute to male infertility. WHAT IS KNOWN ALREADY: Traditional approaches such as karyotyping, CFTR and chromosome Y microdeletion testing are successful in explaining clinical findings in ~30% of MI patients, leaving the rest...
Background: Personalized pharmacotherapy requires systematic consideration of genetic factors influencing drug efficacy and safety. The accumulation of large-scale whole-exome sequencing (WES) data provides an opportunity to assess population frequencies of clinically significant pharmacogenetic variants; however, the diagnostic applicability of exome data for pharmacogenomics remains insufficiently studied. Materials and Methods: A retrospective analysis of 6,102 anonymized sequencing datasets obt...
Background: Exome sequencing (ES) has become a key diagnostic tool for rare diseases (RDs). However, most evidence on ES performance comes from high-income countries and patients of European ancestry. In countries such as Chile, limited access to next-generation sequencing amplifies health disparities and highlights the need to identify which patients are most likely to benefit from ES. Methods: This study presents the second phase of the Chilean DECIPHERD project, in which we performed ES in a n...